N-DIGIT BENFORD DISTRIBUTED RANDOM VARIABLES 
Abstract. The scope of this paper is twofold. First, to emphasize the use of the mod 1 map in exploring the digit distribution of random variables. We show that the well-known base and scale invariance of Benford variables is a consequence of their associated mod 1 density functions being uniformly distributed. Second, to introduce a new concept of the n-digit Benford variable. Such a variable is Benford in the first n digits, but it is not guaranteed to have a logarithmic distribution beyond the n-th digit. We conclude the paper by giving a general construction method for n-digit Benford variables, and provide a concrete example.

1. Introduction

In 1881 Newcomb [1] noticed that the first digit distribution of numerical data is not uniform but rather logarithmic. He had observed that the pages of logarithm tables were more worn out for smaller digits such as 1 and 2 than for larger ones, and concluded that "[the] law of probability of the occurrence of the numbers is such that all mantissae of their logarithms are equally probable." He explicitly tabulated the probability of occurrence of the first and second digits. Newcomb's work was rediscovered in 1938 by Benford [2], who explicitly gave the formula for the probability of a number having the first digit d:

$$P(d) = \log\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \ldots, 9, \qquad (1)$$

where log is used to represent the base 10 logarithm. He gathered empirical evidence for formula (1) by collecting thousands of numbers from diverse datasets, such as the area of riverbeds, atomic weights of elements, etc.
Pinkham proved [1] that the only digit distribution that is left invariant under a scale change of the underlying distribution is the Benford distribution. Scale invariance means that a collection of data has the same digit distribution when multiplied by a constant.

In his seminal paper [3], Hill showed that the appropriate domain for the significant digit probability is the smallest collection of positive real subsets that contains all the infinite sets of the form $\bigcup_{n=-\infty}^{\infty} [1, t) \times 10^{n}$, $t \in [1, 10)$. This collection, denoted by $\mathcal{A}$, is the smallest sigma algebra generated by $D_1, D_2, \ldots$, where $D_i$ is the $i$-th significant-digit function: $D_1 : \mathbb{R}^{+} \to \{1, \ldots, 9\}$ and $D_i : \mathbb{R}^{+} \to \{0, 1, \ldots, 9\}$ for $i \geq 2$. For example, $D_1(2.718) = 2$ and $D_2(2.718) = 7$. Observe that $D_i^{-1}(d) \in \mathcal{A}$ for all $i$ and $d$. Within this framework, for a random variable $Y$, Benford's first digit law can be stated as

$$P(D_1(Y) = d) = \log(1 + d^{-1}). \qquad (2)$$

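In general [6], a random variable $Y$ is Benford if for all $m \in \mathbb{N}$, all $d_1 \in \{1, \ldots, 9\}$, and all $d_j \in \{0, 1, \ldots, 9\}$, $j = 2, \ldots, m$,

$$P(D_1(Y) = d_1, \ldots, D_m(Y) = d_m) = \log\left(1 + \Big(\sum_{i=1}^{m} 10^{m-i} d_i\Big)^{-1}\right). \qquad (3)$$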

For example, the probability of having digits 8, 4, and 7 as the first, second, and third significant digits, respectively, is

$$P(8, 4, 7) = \log\left(1 + \frac{1}{847}\right) \approx 0.000512. \qquad (4)$$
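Equation (3) can be evaluated mechanically by concatenating the digits into the integer $\sum_i 10^{m-i} d_i$. The Python sketch below does only that; the function name `benford_prob` is hypothetical. It reproduces the value in (4) and can be used to confirm that the probabilities over all first digits, and over all first-two-digit pairs, sum to 1.

```python
import math

def benford_prob(digits):
    """P(D1 = d1, ..., Dm = dm) for a Benford variable, restating equation (3)."""
    if not digits or digits[0] == 0:
        raise ValueError("the first significant digit must be in 1..9")
    if any(not 0 <= d <= 9 for d in digits):
        raise ValueError("digits must be in 0..9")
    # Concatenate the digits into the integer 10^(m-1) d1 + 10^(m-2) d2 + ... + dm.
    n = 0
    for d in digits:
        n = 10 * n + d
    return math.log10(1 + 1 / n)

print(benford_prob([8, 4, 7]))  # approximately 0.000512, as in (4)
print(benford_prob([1]))        # log 2, the familiar first-digit value
```

Because the sum in (3) telescopes, the first-digit probabilities add up to $\log 10 - \log 1 = 1$, and the same holds for any fixed number of digits.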

Hill defined base invariance and proved that base invariance, as well as scale invariance, implies the Benford distribution of digits. Leemis et al. [5] investigated several examples of symmetric and non-symmetric distributions that lead to Benford distributed random variables.

This paper is organized as follows. In Section 2 we use the mod 1 map to show that the well-known base and scale invariance of Benford distributed random variables is a consequence of $g' = 1$, where $g'$ is the associated mod 1 density function. In Section 3 we introduce the concept of n-digit Benford distributed random variables, which are guaranteed to obey the logarithmic distribution in their first n digits, and give a general construction method for such variables. Unless otherwise specified, throughout this paper we assume that the base is 10.

2. The mod 1 map 

Any positive real number $y$ can be written as $y = m \times 10^{k}$ for some $k \in \mathbb{Z}$, where $1 \leq m < 10$. Then the first digits of $y$ and $m$ are the same: $D_1(y) = D_1(m)$. Let us assume that $y$ is given by a random variable $Y$ with the density function $f$. Let $X = \log Y$ be the random variable with the density function $g$.

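Definition 2.1. For any real function $g : \mathbb{R} \to \mathbb{R}$, we define $g' = g \pmod 1$ as

$$g'(x) = \begin{cases} \sum_{k=-\infty}^{\infty} g(x + k), & x \in [0, 1), \\ 0, & \text{otherwise}. \end{cases}$$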

Lemma 2.2. The probability of $Y$ having its first digit $d$ is

$$P(D_1(Y) = d) = \int_{\log d}^{\log(d+1)} g'(x)\,dx.$$

Proof. Let us consider the real numbers starting with the digit $d$, i.e., $D_1(y) = d$. These numbers belong to the set $S = \bigcup_{k=-\infty}^{\infty} [d \times 10^{k}, (d+1) \times 10^{k})$. Now consider the random variable $X = \log Y$ with the density function $g$. The logarithmic function maps $S$ into $\bigcup_{k=-\infty}^{\infty} [\log d + k, \log(d+1) + k)$. This set modulo 1 is just $[\log d, \log(d+1))$, as illustrated in Figure 1 for the case $d = 2$.

Figure 1. The modulo 1 set of logarithms of positive numbers starting with 2. (The intervals [0.2, 0.3), [2, 3), [20, 30), [200, 300) map under log to [log 2 + k, log 3 + k), k = -1, 0, 1, 2, and all reduce mod 1 to [log 2, log 3).)

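Consequently

$$\begin{aligned} P(D_1(Y) = d) &= \sum_{k=-\infty}^{\infty} P\big(d \times 10^{k} \leq Y < (d+1) \times 10^{k}\big) \\ &= \sum_{k=-\infty}^{\infty} P\big(\log d + k \leq X < \log(d+1) + k\big) \\ &= P\big(\log d \leq X \pmod 1 < \log(d+1)\big) \\ &= \int_{\log d}^{\log(d+1)} g'(x)\,dx. \end{aligned}$$

□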
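Lemma 2.2 can be checked numerically. The sketch below is a minimal illustration, assuming (purely for concreteness) that $X = \log Y$ is normal with hypothetical parameters MU = 0.3 and SIGMA = 0.4. It truncates the sum in Definition 2.1 to $|k| \leq 20$ and compares the integral of $g'$ over $[\log d, \log(d+1))$ with the sum of interval probabilities from the second line of the proof.

```python
import math

MU, SIGMA = 0.3, 0.4  # hypothetical parameters: X = log Y ~ Normal(MU, SIGMA)

def g(x):
    """Density of X (a normal density, chosen only for illustration)."""
    z = (x - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2 * math.pi))

def G(x):
    """Cumulative distribution function of X."""
    return 0.5 * (1 + math.erf((x - MU) / (SIGMA * math.sqrt(2))))

def g_mod1(x, kmax=20):
    """Definition 2.1: g'(x) = sum_k g(x + k) on [0, 1), truncated to |k| <= kmax."""
    if not 0 <= x < 1:
        return 0.0
    return sum(g(x + k) for k in range(-kmax, kmax + 1))

def p_wrapped(d, steps=2000):
    """Lemma 2.2: integral of g' over [log d, log(d+1)), by the midpoint rule."""
    a, b = math.log10(d), math.log10(d + 1)
    h = (b - a) / steps
    return h * sum(g_mod1(a + (i + 0.5) * h) for i in range(steps))

def p_direct(d, kmax=20):
    """Second line of the proof: sum over k of P(log d + k <= X < log(d+1) + k)."""
    a, b = math.log10(d), math.log10(d + 1)
    return sum(G(b + k) - G(a + k) for k in range(-kmax, kmax + 1))

for d in range(1, 10):
    print(d, round(p_wrapped(d), 5), round(p_direct(d), 5))
```

The two columns agree to quadrature accuracy; a normal $X$ is not Benford, so neither column matches $\log(1 + 1/d)$.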

Clearly, if $g' = 1$, one obtains Benford's law $P(D_1(Y) = d) = \log(1 + 1/d)$. The uniformity of $g'$ resonates with Newcomb's pioneering observation in 1881 [1] that the "probability of the occurrence of the numbers is such that all mantissae of their logarithms are equally probable."
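As a quick sanity check of the case $g' = 1$, the following Monte Carlo sketch draws $X$ uniformly on $[0, 3)$ (an interval of integer length, so its mod 1 density is uniform) and tabulates the first digits of $Y = 10^{X}$. The range $[0, 3)$, the sample size, and the seed are arbitrary choices made for illustration.

```python
import math
import random
from collections import Counter

def first_digit(y):
    """Leading significant digit of y > 0, read off the mantissa 10**(log10(y) mod 1)."""
    mantissa = 10 ** (math.log10(y) % 1)
    return min(int(mantissa), 9)  # guard against a float rounding up to 10.0

rng = random.Random(2024)  # fixed seed so the run is reproducible
N = 200_000
# X uniform on [0, 3): an interval of integer length, so g' = 1 on [0, 1).
counts = Counter(first_digit(10 ** rng.uniform(0, 3)) for _ in range(N))
for d in range(1, 10):
    print(d, round(counts[d] / N, 4), round(math.log10(1 + 1 / d), 4))
```

With this sample size the empirical frequencies typically fall within about 0.002 of $\log(1 + 1/d)$.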

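Lemma 2.2 can be readily generalized to a sequence of prescribed digits $d_1, d_2, \ldots, d_n$, where $d_1 \in \{1, 2, \ldots, 9\}$ and $d_i \in \{0, 1, \ldots, 9\}$ for $i > 1$:

$$P(D_1(Y) = d_1, D_2(Y) = d_2, \ldots, D_n(Y) = d_n) = \int_{\log\left(d_1 + \frac{d_2}{10} + \cdots + \frac{d_n}{10^{n-1}}\right)}^{\log\left(d_1 + \frac{d_2}{10} + \cdots + \frac{d_n + 1}{10^{n-1}}\right)} g'(x)\,dx. \qquad (5)$$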
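The integration limits in (5) are the base 10 logarithms of the decimal number $d_1.d_2 \ldots d_n$ and of that number plus $10^{-(n-1)}$. A small sketch (the helper name `digit_interval` is hypothetical) computes these limits; for $g' = 1$ the interval length equals the Benford probability (3), and the intervals of all two-digit strings tile $[0, 1)$.

```python
import math

def digit_interval(digits):
    """Mod 1 integration range of equation (5) for the digit string d1 d2 ... dn."""
    if not digits or not 1 <= digits[0] <= 9 or any(not 0 <= d <= 9 for d in digits):
        raise ValueError("need d1 in 1..9 and later digits in 0..9")
    m = 0
    for d in digits:
        m = 10 * m + d  # m = 10^(n-1) d1 + 10^(n-2) d2 + ... + dn
    scale = 10 ** (len(digits) - 1)
    # d1 + d2/10 + ... + dn/10^(n-1) = m / scale; the upper limit adds 1/10^(n-1).
    return math.log10(m / scale), math.log10((m + 1) / scale)

lo, hi = digit_interval([8, 4, 7])
print(lo, hi, hi - lo)  # the length hi - lo is log(1 + 1/847), as in (4)
```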



2.1. Scale Invariance, Base Invariance, and the mod 1 map. Scale invariance means that a collection of data keeps its digit distribution when multiplied by a constant. For example, suppose that the prices of goods are Benford distributed; then scale invariance implies that these prices remain Benford distributed regardless of the currency into which they are converted.

Lemma 2.3. Scaling of a random variable $Y$ is equivalent to a translation of $X = \log Y$.

Proof. Let $X$ be a random variable with the density function $g$, and let $X_1$ be the random variable generated by the translation of $X$ by $t$ units, i.e., $X_1 = X + t$ and $g_1(x) = g(x - t)$, where $g_1$ is the density function of $X_1$. Define $Y = 10^{X}$ and $Y_1 = 10^{X_1}$, and let $f$ and $f_1$ be their corresponding density functions. In terms of cumulative distribution functions we have

$$F_1(y) = G_1(\log y) = G(\log y - t) = G\left(\log \frac{y}{10^{t}}\right) = F\left(\frac{y}{10^{t}}\right),$$

so $Y_1$ has the distribution of $10^{t} Y$. The converse is immediate. □



Due to modular arithmetic, a translation of $g$ results in a wrap-around effect in $g'$. Hence, scaling of $Y$ induces a wrap-around of $g'$. For example, let $X = \log Y$ be the random variable with a triangular density function $g$ supported on $[0, 3]$, and let $X_1$ be its translation by $t$. The effect of the scaling of $Y$ on $g'$ is shown in Figure 2.
Figure 2. The wrap-around effect on $g'$ of a translation by $t = 0.65$ of the triangular density $g$ on $[0, 3]$.

By Lemma 2.2, we observe that $P(D_1(Y) = d) \neq P(D_1(Y_1) = d)$ for some $d$, i.e., the first digit distributions of $Y$ and $Y_1$ are not the same, indicating that $Y$ is not scale invariant. One can see that only a uniform $g'$ remains unchanged under the wrap-around effect induced by a scaling of $Y$.
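The wrap-around effect can be quantified with Lemmas 2.2 and 2.3. The sketch below compares the first-digit distribution of $Y$ and of the scaled variable $Y_1 = 10^{0.65} Y$ for two choices of $X$: a triangular density on $[0, 3]$ whose mode 1.5 is a hypothetical choice (the exact parameters behind Figure 2 are not assumed), and a uniform density on $[0, 1)$, for which $g' = 1$.

```python
import math

def tri_cdf(x, a=0.0, c=1.5, b=3.0):
    """CDF of Triangle(a, c, b); the mode c = 1.5 is a hypothetical choice."""
    if x <= a:
        return 0.0
    if x <= c:
        return (x - a) ** 2 / ((b - a) * (c - a))
    if x < b:
        return 1 - (b - x) ** 2 / ((b - a) * (b - c))
    return 1.0

def uniform_cdf(x):
    """CDF of X uniform on [0, 1), whose mod 1 density is g' = 1."""
    return min(max(x, 0.0), 1.0)

def first_digit_probs(cdf, t=0.0, kmax=10):
    """First-digit law of Y1 = 10**t * Y, i.e. of X1 = X + t (Lemmas 2.2 and 2.3)."""
    probs = []
    for d in range(1, 10):
        a, b = math.log10(d), math.log10(d + 1)
        # P(log d + k <= X + t < log(d+1) + k), summed over the integer shifts k
        probs.append(sum(cdf(b + k - t) - cdf(a + k - t) for k in range(-kmax, kmax + 1)))
    return probs

for name, cdf in [("triangle", tri_cdf), ("uniform", uniform_cdf)]:
    before = first_digit_probs(cdf)
    after = first_digit_probs(cdf, t=0.65)
    gap = max(abs(p - q) for p, q in zip(before, after))
    print(name, round(gap, 4))  # largest change in any first-digit probability
```

For the triangular density the largest change is a few hundredths, while for the uniform case it vanishes up to rounding, in line with Theorem 2.4.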

Theorem 2.4. Only the random variables characterized by $g' = 1$ are scale invariant.

Proof. By Lemma 2.3, scaling of $Y$ yields a wrap-around effect of $g'$. For the digit distribution to be preserved, we need the same areas under $g'$ and $g_1'$ over the intervals $[0, \log 2), [\log 2, \log 3), \ldots, [\log 9, 1)$, where $g_1'$ denotes the mod 1 projection of $g_1$, the shifted $g$. Clearly a uniform $g'$ satisfies this constraint. Any other function, when translated, will not simultaneously preserve the areas under $g'$ over all of these intervals for every shift $t$. Indeed, extending $g'$ periodically, if $\int_{t}^{t + \log 2} g'(x)\,dx$ does not depend on $t$, then differentiating in $t$ gives $g'(t + \log 2) = g'(t)$ for almost every $t$; since $\log 2$ is irrational, this forces $g'$ to be constant, i.e., $g' = 1$. Thus the only $g'$ left unchanged is the uniform one. □

We now arrive at two important results concerning scale and base invariance, first proved by Hill [3].

Corollary 2.5. Scale invariance implies Benford's law.

Proof. In order to have scale invariance, we must have $g' = 1$ by Theorem 2.4. From Lemma 2.2 one then obtains Benford's law. □



Base invariance means that for any base $b$, the probability of having $d$ as the first digit is

$$P_b(D_1(Y) = d) = \log_b(1 + 1/d), \qquad d = 1, \ldots, b - 1.$$

Corollary 2.6. Scale invariance implies base invariance.

Proof. Scale invariance demands $g' = 1$. Then, for an arbitrary base $b$, integrating over the logarithmic intervals $[\log_b d, \log_b(d + 1))$ we get Benford's distribution

$$P(D_1(Y) = d) = \log_b(1 + 1/d), \qquad d \in \{1, \ldots, b - 1\}. \qquad (6)$$

□

Using the insight gained by the mod 1 map analysis we can construct an infinite class 
of Benford distributed random variables with non-compact support. To this end, let us 
consider X with the following density function 



$$g(x) = \sum_{n=1}^{\infty} \frac{1}{2^{n}}\big[H(x - n) - H(x - n - 1)\big], \qquad (7)$$

as illustrated in Figure 3. Here $H(x)$ is the Heaviside step function, so $g$ equals $2^{-n}$ on each interval $[n, n + 1)$, $n \geq 1$. Since $\sum_{n=1}^{\infty} 2^{-n} = 1$, one obtains $g' = 1$, and thus $Y = 10^{X}$ is Benford distributed. A similar construction can be used for any convergent series of non-negative terms normalized to sum to 1.

Figure 3. Example of $g$ with non-compact support yielding a uniform $g'$.
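Equation (7) is easy to check numerically: $g$ equals $2^{-n}$ on $[n, n + 1)$, so each point $x \in [0, 1)$ picks up one term $2^{-k}$ from every integer shift $k \geq 1$. The sketch below truncates the sum at a hypothetical cutoff of 60 shifts and takes $H(0) = 1$; it evaluates $g'$ at a few points.

```python
import math

def g(x):
    """Equation (7): g equals 2^-n on [n, n + 1) for n >= 1 and 0 elsewhere.

    With H(0) = 1, H(x - n) - H(x - n - 1) is the indicator of [n, n + 1).
    """
    n = math.floor(x)
    return 2.0 ** -n if n >= 1 else 0.0

def g_mod1(x, kmax=60):
    """g'(x) = sum_k g(x + k) on [0, 1); g vanishes below 1, so only k >= 1 contribute."""
    return sum(g(x + k) for k in range(1, kmax + 1))

for x in [0.0, 0.3, 0.7, 0.999]:
    print(x, g_mod1(x))  # each value equals 1 up to the truncation error 2^-60
```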
3. n-digit Benford variables

We define a random variable that has a logarithmic distribution in its first n digits to be 
an n-digit Benford variable. More precisely, such a variable is Benford in the first n digits, 
but it is not guaranteed to have a logarithmic distribution beyond the n-th digit. 

Definition 3.1. Let $n \in \mathbb{N}$. A random variable $Y$ is n-digit Benford if for all $d_1 \in \{1, \ldots, 9\}$ and all $d_j \in \{0, 1, \ldots, 9\}$, $2 \leq j \leq n$,

$$P[D_1(Y) = d_1, \ldots, D_n(Y) = d_n] = \log\left(1 + \frac{1}{10^{n-1} d_1 + 10^{n-2} d_2 + \cdots + d_n}\right). \qquad (8)$$
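To show that such variables exist, we use (5) to construct a $g'$ that satisfies (8) for any $n \in \mathbb{N}$. That is, we search for $g'$ such that

$$\int_{\log\left(d_1 + \frac{d_2}{10} + \cdots + \frac{d_n}{10^{n-1}}\right)}^{\log\left(d_1 + \frac{d_2}{10} + \cdots + \frac{d_n + 1}{10^{n-1}}\right)} g'(x)\,dx = \log\left(1 + \frac{1}{10^{n-1} d_1 + 10^{n-2} d_2 + \cdots + d_n}\right). \qquad (9)$$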

We use the partition $\{\log 1, \log 2, \ldots, \log 10\}$ of the interval $[0, 1)$. Then an example of $g'$ which yields a 1-digit Benford distribution is given by

$$g'(x) = \begin{cases} \dfrac{2(x - \log k)}{\log(1 + 1/k)}, & \log k \leq x < \log(k + 1), \quad k = 1, \ldots, 9, \\ 0, & \text{otherwise}, \end{cases} \qquad (10)$$

as illustrated in Figure 4. One can easily check that $\int_{\log d}^{\log(d+1)} g'(x)\,dx = \log(1 + 1/d)$.
Figure 4. Example of $g'$ yielding a 1-digit Benford random variable.
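The 1-digit construction (10) can be checked in a few lines. The sketch below integrates the ramp density with the midpoint rule (exact for linear pieces, up to rounding) and also evaluates, via (5), the probability that the first two digits are 1 and 0. This shows that this $g'$ is not Benford beyond the first digit. The helper names are hypothetical.

```python
import math

LENGTHS = [math.log10(1 + 1 / k) for k in range(1, 10)]  # widths of [log k, log(k+1))

def g_mod1(x):
    """Equation (10): on [log k, log(k+1)), g'(x) = 2 (x - log k) / log(1 + 1/k)."""
    k = min(int(10 ** x), 9)  # 10**x lies in [k, k+1) exactly when x lies in [log k, log(k+1))
    return 2 * (x - math.log10(k)) / LENGTHS[k - 1]

def integrate(f, a, b, steps=4000):
    """Midpoint rule on [a, b); exact (up to rounding) for the linear pieces used here."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# First digit: the integral over [log d, log(d+1)) reproduces log(1 + 1/d).
for d in range(1, 10):
    print(d, integrate(g_mod1, math.log10(d), math.log10(d + 1)), LENGTHS[d - 1])

# First two digits 1, 0: by (5) the mod 1 range is [log 1, log 1.1).
p10 = integrate(g_mod1, 0.0, math.log10(1.1))
print(p10, math.log10(1 + 1 / 10))  # roughly 0.0057 versus 0.0414
```

The ramp concentrates mass near the right end of each first-digit interval, which is exactly why the second digit is skewed.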

By Theorem 2.4, any density function $g$ whose $g'$ is non-uniform is not scale invariant. Thus the example of equation (10), while yielding a 1-digit Benford distributed variable, is neither scale nor base invariant.

Following this idea, one can generalize the example given in (10) in order to produce a $g'$ which satisfies (9) with an arbitrarily large $n$. For such a construction, let

$$0 = a_0 < a_1 < \cdots < a_{m-1} < a_m = 1$$

be a partition of the interval $[0, 1)$, and for each $j$ let $h_j : [0, 1] \to [0, \infty)$ be a probability density function. For $x \in [0, 1)$ such that $a_j \leq x < a_{j+1}$, let us define



$$g'(x) = h_j\left(\frac{x - a_j}{a_{j+1} - a_j}\right). \qquad (11)$$



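It is easy to see, by the substitution $u = (x - a_j)/(a_{j+1} - a_j)$, that for all $j$

$$\int_{a_j}^{a_{j+1}} g'(x)\,dx = (a_{j+1} - a_j) \int_{0}^{1} h_j(u)\,du = a_{j+1} - a_j.$$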



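Therefore $g'$ behaves as the uniform density ($g'(x) = 1$ for all $0 \leq x < 1$) when integrated over intervals of the form $[a_k, a_l)$ with $0 \leq k < l \leq m$. In particular, choosing $m = 9 \times 10^{n-1}$ and $a_j = \log(1 + j/10^{n-1})$, the integration ranges in (9) are exactly the intervals $[a_j, a_{j+1})$, so $g'$ satisfies (9) and $Y = 10^{X}$ is n-digit Benford.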
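A minimal sketch of the construction (11) for $n = 2$, assuming the partition $a_j = \log(1 + j/10^{n-1})$ and the same hypothetical density $h_j(u) = 2u$ on every subinterval. Since $g'$ is piecewise, its cumulative distribution function is available in closed form, so the probabilities in (5) are computed exactly rather than by quadrature. The first two digits come out Benford, while the third does not.

```python
import bisect
import math

N_DIGITS = 2                   # n: the number of digits the construction targets
SCALE = 10 ** (N_DIGITS - 1)   # 10^(n-1)
# Partition a_j = log(1 + j / 10^(n-1)), j = 0, ..., 9 * 10^(n-1)
A = [math.log10((SCALE + j) / SCALE) for j in range(9 * SCALE + 1)]

def h_cdf(u):
    """CDF of the hypothetical density h_j(u) = 2u, used for every j."""
    return u * u

def G(x):
    """CDF of the mod 1 density (11): the integral of g' from 0 to x."""
    if x >= A[-1]:
        return 1.0
    j = bisect.bisect_right(A, x) - 1  # a_j <= x < a_{j+1}
    width = A[j + 1] - A[j]
    return A[j] + width * h_cdf((x - A[j]) / width)

def prob(digits):
    """P(D1 = d1, ..., Dk = dk) from equation (5), for any number k of digits."""
    m = int("".join(map(str, digits)))
    scale = 10 ** (len(digits) - 1)
    return G(math.log10((m + 1) / scale)) - G(math.log10(m / scale))

def benford(digits):
    """Right-hand side of equation (3)."""
    return math.log10(1 + 1 / int("".join(map(str, digits))))

print(prob([4, 7]), benford([4, 7]))        # first n digits: the two values agree
print(prob([4, 7, 0]), benford([4, 7, 0]))  # digit n + 1: they differ
```

Replacing `h_cdf` with the CDF of any other density on $[0, 1]$ leaves the two-digit agreement intact, which is the point of the construction.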

4. Conclusions 

In this paper we demonstrated the use of the mod 1 map in analyzing the digit distribution of random variables. In particular, we have shown that a uniform $g'$ implies scale and base invariance. We have introduced the concept of n-digit Benford variables and gave a concrete example of a 1-digit Benford variable. Furthermore, given any density function, we have shown how to use the mod 1 map to construct an example of an n-digit Benford variable.